<insecure_file_uploads_guide>
<title>INSECURE FILE UPLOADS</title>

<critical>Upload surfaces are high risk: server-side execution (RCE), stored XSS, malware distribution, storage takeover, and DoS. Modern stacks mix direct-to-cloud uploads, background processors, and CDNs—authorization and validation must hold across every step.</critical>

<scope>
- Web/mobile/API uploads, direct-to-cloud (S3/GCS/Azure) presigned flows, resumable/multipart protocols (tus, S3 MPU)
- Image/document/media pipelines (ImageMagick/GraphicsMagick, Ghostscript, ExifTool, PDF engines, office converters)
- Admin/bulk importers, archive uploads (zip/tar), report/template uploads, rich text with attachments
- Serving paths: app directly, object storage, CDN, email attachments, previews/thumbnails
</scope>

<methodology>
1. Map the pipeline: client → ingress (edge/app/gateway) → storage → processors (thumb, OCR, AV, CDR) → serving (app/storage/CDN). Note where validation and auth occur.
2. Identify allowed types, size limits, filename rules, storage keys, and who serves the content. Collect baseline uploads per type and capture resulting URLs and headers.
3. Exercise bypass families systematically: extension games, MIME/content-type, magic bytes, polyglots, metadata payloads, archive structure, chunk/finalize differentials.
4. Validate execution and rendering: can uploaded content execute on server or client? Confirm with minimal PoCs and headers analysis.
</methodology>

<discovery_techniques>
<surface_map>
- Endpoints/fields: upload, file, avatar, image, attachment, import, media, document, template
- Direct-to-cloud params: key, bucket, acl, Content-Type, Content-Disposition, x-amz-meta-*, cache-control
- Resumable APIs: create/init → upload/chunk → complete/finalize; check if metadata/headers can be altered late
- Background processors: thumbnails, PDF→image, virus scan queues; identify timing and status transitions
</surface_map>

<capability_probes>
- Small probe files of each claimed type; diff resulting Content-Type, Content-Disposition, and X-Content-Type-Options on download
- Magic bytes vs extension: JPEG/GIF/PNG headers; mismatches reveal reliance on extension or MIME sniffing
- SVG/HTML probe: do they render inline (text/html or image/svg+xml) or download (attachment)?
- Archive probe: simple zip with nested path traversal entries and symlinks to detect extraction rules
</capability_probes>
</discovery_techniques>

<detection_channels>
<server_execution>
- Web shell execution (language dependent), config/handler uploads (.htaccess, .user.ini, web.config) enabling execution
- Interpreter-side template/script evaluation during conversion (ImageMagick/Ghostscript/ExifTool)
</server_execution>

<client_execution>
- Stored XSS via SVG/HTML/JS if served inline without correct headers; PDF JavaScript; office macros in previewers
</client_execution>

<header_and_render>
- Missing X-Content-Type-Options: nosniff enabling browser sniff to script
- Content-Type reflection from upload vs server-set; Content-Disposition: inline vs attachment
</header_and_render>

<process_side_effects>
- AV/CDR race or absence; background job status allows access before scan completes; password-protected archives bypass scanning
</process_side_effects>
</detection_channels>

<core_payloads>
<web_shells_and_configs>
- PHP: GIF polyglot (starts with GIF89a) followed by <?php echo 1; ?>; place where PHP is executed
- .htaccess to map extensions to code (AddType/AddHandler); .user.ini (auto_prepend/append_file) for PHP-FPM
- ASP/JSP equivalents where supported; IIS web.config to enable script execution
</web_shells_and_configs>

<stored_xss>
- SVG with onload/onerror handlers served as image/svg+xml or text/html
- HTML file with script when served as text/html or sniffed due to missing nosniff
</stored_xss>

<mime_magic_polyglots>
- Double extensions: avatar.jpg.php, report.pdf.html; mixed casing: .pHp, .PhAr
- Magic-byte spoofing: valid JPEG header then embedded script; verify server uses content inspection, not extensions alone
</mime_magic_polyglots>

<archive_attacks>
- Zip Slip: entries with ../../ to escape extraction dir; symlink-in-zip pointing outside target; nested zips
- Zip bomb: extreme compression ratios (e.g., 42.zip) to exhaust resources in processors
</archive_attacks>

<toolchain_exploits>
- ImageMagick/GraphicsMagick legacy vectors (policy.xml may mitigate): crafted SVG/PS/EPS invoking external commands or reading files
- Ghostscript in PDF/PS with file operators (%pipe%)
- ExifTool metadata parsing bugs; overly large or crafted EXIF/IPTC/XMP fields
</toolchain_exploits>

<cloud_storage_vectors>
- S3/GCS presigned uploads: attacker controls Content-Type/Disposition; set text/html or image/svg+xml and inline rendering
- Public-read ACL or permissive bucket policies expose uploads broadly; object key injection via user-controlled path prefixes
- Signed URL reuse and stale URLs; serving directly from bucket without attachment + nosniff headers
</cloud_storage_vectors>
</core_payloads>

<advanced_techniques>
<resumable_multipart>
- Change metadata between init and complete (e.g., swap Content-Type/Disposition at finalize)
- Upload benign chunks, then swap last chunk or complete with different source if server trusts client-side digests only
</resumable_multipart>

<filename_and_path>
- Unicode homoglyphs, trailing dots/spaces, device names, reserved characters to bypass validators and filesystem rules
- Null-byte truncation on legacy stacks; overlong paths; case-insensitive collisions overwriting existing files
</filename_and_path>

<processing_races>
- Request file immediately after upload but before AV/CDR completes; or during derivative creation to get unprocessed content
- Trigger heavy conversions (large images, deep PDFs) to widen race windows
</processing_races>

<metadata_abuse>
- Oversized EXIF/XMP/IPTC blocks to trigger parser flaws; payloads in document properties of Office/PDF rendered by previewers
</metadata_abuse>

<header_manipulation>
- Force inline rendering with Content-Type + inline Content-Disposition; test browsers with and without nosniff
- Cache poisoning via CDN with keys missing Vary on Content-Type/Disposition
</header_manipulation>
</advanced_techniques>

<filter_bypasses>
<validation_gaps>
- Client-side only checks; relying on JS/MIME provided by browser; trusting multipart boundary part headers blindly
- Extension allowlists without server-side content inspection; magic-bytes only without full parsing
</validation_gaps>

<evasion_tricks>
- Double extensions, mixed case, hidden dotfiles, extra dots (file..png), long paths with allowed suffix
- Multipart name vs filename vs path discrepancies; duplicate parameters and late parameter precedence
</evasion_tricks>
</filter_bypasses>

<special_contexts>
<rich_text_editors>
- RTEs allow image/attachment uploads and embed links; verify sanitization and serving headers for embedded content
</rich_text_editors>

<mobile_clients>
- Mobile SDKs may send nonstandard MIME or metadata; servers sometimes trust client-side transformations or EXIF orientation
</mobile_clients>

<serverless_and_cdn>
- Direct-to-bucket uploads with Lambda/Workers post-processing; verify that security decisions are not delegated to frontends
- CDN caching of uploaded content; ensure correct cache keys and headers (attachment, nosniff)
</serverless_and_cdn>
</special_contexts>

<parser_hardening>
- Validate on server: strict allowlist by true type (parse enough to confirm), size caps, and structural checks (dimensions, page count)
- Strip active content: convert SVG→PNG; remove scripts/JS from PDF; disable macros; normalize EXIF; consider CDR for risky types
- Store outside web root; serve via application or signed, time-limited URLs with Content-Disposition: attachment and X-Content-Type-Options: nosniff
- For cloud: private buckets, per-request signed GET, enforce Content-Type/Disposition on GET responses from your app/gateway
- Disable execution in upload paths; ignore .htaccess/.user.ini; sanitize keys to prevent path injections; randomize filenames
- AV + CDR: scan synchronously when possible; quarantine until verdict; block password-protected archives or process in sandbox
</parser_hardening>

<validation>
1. Demonstrate execution or rendering of active content: web shell reachable, or SVG/HTML executing JS when viewed.
2. Show filter bypass: upload accepted despite restrictions (extension/MIME/magic mismatch) with evidence on retrieval.
3. Prove header weaknesses: inline rendering without nosniff or missing attachment; present exact response headers.
4. Show race or pipeline gap: access before AV/CDR; extraction outside intended directory; derivative creation from malicious input.
5. Provide reproducible steps: request/response for upload and subsequent access, with minimal PoCs.
</validation>

<false_positives>
- Upload stored but never served back; or always served as attachment with strict nosniff
- Converters run in locked-down sandboxes with no external IO and no script engines; no path traversal on archive extraction
- AV/CDR blocks the payload and quarantines; access before scan is impossible by design
</false_positives>

<impact>
- Remote code execution on application stack or media toolchain host
- Persistent cross-site scripting and session/token exfiltration via served uploads
- Malware distribution via public storage/CDN; brand/reputation damage
- Data loss or corruption via overwrite/zip slip; service degradation via zip bombs or oversized assets
</impact>

<pro_tips>
1. Keep PoCs minimal: tiny SVG/HTML for XSS, a single-line PHP/ASP where relevant, and benign magic-byte polyglots.
2. Always capture download response headers and final MIME from the server/CDN; that decides browser behavior.
3. Prefer transforming risky formats to safe renderings (SVG→PNG) rather than attempting complex sanitization.
4. In presigned flows, constrain all headers and object keys server-side; ignore client-supplied ACL and metadata.
5. For archives, extract in a chroot/jail with explicit allowlist; drop symlinks and reject traversal.
6. Test finalize/complete steps in resumable flows; many validations only run on init, not at completion.
7. Verify background processors with EICAR and tiny polyglots; ensure quarantine gates access until safe.
8. When you cannot get execution, aim for stored XSS or header-driven script execution; both are impactful.
9. Validate that CDNs honor attachment/nosniff and do not override Content-Type/Disposition.
10. Document full pipeline behavior per asset type; defenses must match actual processors and serving paths.
</pro_tips>

<remember>Secure uploads are a pipeline property. Enforce strict type, size, and header controls; transform or strip active content; never execute or inline-render untrusted uploads; and keep storage private with controlled, signed access.</remember>
</insecure_file_uploads_guide>
